channeldb: fix race condition in link node pruning #10462
Conversation
Summary of Changes: Hello @ziggie1984, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical race condition within the link node pruning logic.
Code Review
This pull request correctly identifies and fixes a critical TOCTOU race condition in MarkChanFullyClosed and pruneLinkNode. By moving the link node existence check and deletion into the same atomic database transaction, it prevents a state where a link node could be deleted despite a new channel being opened. The fix in MarkChanFullyClosed is direct and effective. The fix in pruneLinkNode correctly uses a double-check pattern to maintain performance while ensuring correctness. I have one suggestion to improve the implementation of pruneLinkNode for better code clarity and consistency.
Force-pushed from faf8cca to d249c83
This happened out there in the wild and was reported by a user: although a pending OpenChannel flow was going on in the background, we deleted the node from the db.
Force-pushed from 50cad71 to 1186e3e
My node was impacted by this bug. We had to ask the peer to force close the channel and then used chantools to sweep the funds.
```go
// Decide whether we want to remove the link node, based upon the number
// of still open channels.
return c.pruneLinkNode(openChannels, pruneLinkNode)
err = deleteLinkNode(tx, remotePub)
```
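For context, the shape of the fix quoted above is roughly the following. This is a minimal sketch rather than the actual patch: fetchOpenChannels is assumed to be a transaction-scoped helper like the deleteLinkNode call shown in the diff, and lnd's kvdb.Update signature with a trailing reset callback is assumed.

```go
// Sketch: perform the check and the delete in ONE write transaction, so
// no channel can be opened between reading the channel set and deleting
// the link node.
err := kvdb.Update(c.backend, func(tx kvdb.RwTx) error {
	// Re-read the peer's open channels inside this transaction; the
	// result cannot be invalidated by a concurrent channel open.
	openChannels, err := fetchOpenChannels(tx, remotePub)
	if err != nil {
		return err
	}

	// Only when no channels remain is it safe to drop the link node.
	if len(openChannels) > 0 {
		return nil
	}

	return deleteLinkNode(tx, remotePub)
}, func() {})
```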
Yup, we should have been doing this in the same db transaction the entire time.
Do we need any sort of reconciliation logic on start up to handle instances that might've happened in the wild? So sanity check that for each open channel, we have a link node present.
Added a fix at startup. Not sure if it is worth it, but I think it does not cost much either, and it can be removed when moving to SQL native?
> not sure if it is worth it
My thinking is that there may be nodes in an anomalous state that have a channel, but not an active LinkNode.
E.g., on startup, we use this to determine who we need to connect out to:
Lines 3489 to 3495 in 801de79
```go
// Iterate through the list of LinkNodes to find addresses we should
// attempt to connect to based on our set of previous connections. Set
// the reconnection port to the default peer port.
linkNodes, err := s.chanStateDB.LinkNodeDB().FetchAllLinkNodes()
if err != nil && !errors.Is(err, channeldb.ErrLinkNodesNotFound) {
	return fmt.Errorf("failed to fetch all link nodes: %w", err)
}
```
Ahh, you are right, I did not see that. But this is now fixed with the repair function at the beginning of the server startup.
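For illustration, a first cut of such a startup repair pass might look like the sketch below. The helper names (fetchAllOpenChannels, FetchLinkNode, CreateLinkNode) and the ErrNodeNotFound sentinel are illustrative assumptions, not the PR's exact code. Note that each linkNodeDB call here opens its own transaction, which is exactly what the review further down suggests aggregating.

```go
// Sketch: ensure every peer we have an open channel with also has a
// link node, recreating any that were lost to the old race.
channels, err := c.fetchAllOpenChannels()
if err != nil {
	return err
}

for _, channel := range channels {
	pub := channel.IdentityPub

	_, err := c.linkNodeDB.FetchLinkNode(pub)
	switch {
	case err == nil:
		// Link node already present, nothing to repair.
		continue

	case errors.Is(err, ErrNodeNotFound):
		// Anomalous state: open channel without a link node.
		if err := c.linkNodeDB.CreateLinkNode(pub); err != nil {
			return err
		}

	default:
		return err
	}
}
```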
yyforyongyu left a comment
🛠️
| "from database", | ||
| remotePub.SerializeCompressed()) | ||
|
|
||
| return c.linkNodeDB.DeleteLinkNode(remotePub) |
Guess this DeleteLinkNode can now be deleted?
It is still used in tests so that we can add and remove link nodes; I think we can keep it.
@ziggie1984, remember to re-request review from reviewers when ready
channeldb/db.go (outdated):

```go
// Link node doesn't exist, so we need to create it.
// We create it within a transaction to ensure consistency.
err = kvdb.Update(c.backend, func(tx kvdb.RwTx) error {
```
Why not aggregate the DB transactions? We can call into linkNodeDB in the same transaction. This'll speed up startup time, if we are calling this on startup each time. So the first loop can make a set of the nodes we're connected to and check if a link node exists or not. Then one db txn makes all the link nodes, exiting early if that's not needed.
Good point, I did not pay attention that every linkNodeDB call was creating its own transaction. Now only two transactions are used. I also enhanced the linkNodeDB abstraction with the appropriate functions.
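A minimal sketch of the aggregated shape described here, assuming lnd's kvdb.View/kvdb.Update signatures with a reset callback and the same illustrative helpers as above (fetchAllOpenChannels, linkNodeExists, createLinkNode are stand-ins, not the PR's exact code):

```go
// Transaction 1 (read): collect peers that have an open channel but no
// link node.
var missing []*btcec.PublicKey
err := kvdb.View(c.backend, func(tx kvdb.RTx) error {
	channels, err := fetchAllOpenChannels(tx)
	if err != nil {
		return err
	}

	for _, channel := range channels {
		pub := channel.IdentityPub
		if !linkNodeExists(tx, pub) {
			missing = append(missing, pub)
		}
	}

	return nil
}, func() {
	missing = nil
})
if err != nil {
	return err
}

// Early exit: nothing to repair, so no write transaction is needed.
if len(missing) == 0 {
	return nil
}

// Transaction 2 (write): create all missing link nodes in one batch.
return kvdb.Update(c.backend, func(tx kvdb.RwTx) error {
	for _, pub := range missing {
		if err := createLinkNode(tx, pub); err != nil {
			return err
		}
	}

	return nil
}, func() {})
```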
This commit fixes a critical race condition in MarkChanFullyClosed and
pruneLinkNode where link nodes could be incorrectly deleted despite
having pending or open channels. The race occurred because the check for
open channels and the link node deletion happened in separate database
transactions:

Thread A: TX1 checks open channels → [] (empty)
Thread A: TX1 commits
Thread B: Opens new channel with same peer
Thread A: TX2 deletes link node (using stale data)
Result: Link node deleted despite a pending channel existing

This creates a TOCTOU (time-of-check to time-of-use) vulnerability where
database state changes between reading the channel count and deleting
the node.

Fix for MarkChanFullyClosed:
- Move link node deletion into the same transaction as the channel
  closing check, making the check-and-delete operation atomic.

Fix for pruneLinkNode:
- Add a double-check within the write transaction to verify no channels
  were opened since the caller's initial check.
- Maintain performance by keeping the early return for the common case.
- Prevent deletion if channels exist at delete time.

This ensures the invariant "a link node exists iff channels exist" is
never violated, preventing database corruption and potential connection
issues.
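The pruneLinkNode half of the fix is the classic double-check pattern. A sketch under the same assumptions as above (illustrative helper names, not the PR's exact signature):

```go
// Sketch: the caller's earlier cheap check allows an early return, but
// the authoritative check is repeated inside the write transaction.
func (c *ChannelStateDB) pruneLinkNode(openChannels []*OpenChannel,
	remotePub *btcec.PublicKey) error {

	// Fast path: the caller already saw open channels, so the link
	// node must stay and no write transaction is needed.
	if len(openChannels) > 0 {
		return nil
	}

	return kvdb.Update(c.backend, func(tx kvdb.RwTx) error {
		// Double-check inside the transaction: a channel may have
		// been opened since the caller's snapshot was taken.
		channels, err := fetchOpenChannels(tx, remotePub)
		if err != nil {
			return err
		}
		if len(channels) > 0 {
			// A new channel appeared; keep the link node.
			return nil
		}

		return deleteLinkNode(tx, remotePub)
	}, func() {})
}
```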
Force-pushed from eb5c070 to 270fc8d
/gemini review
Code Review
This pull request effectively addresses a critical race condition in link node pruning by ensuring that channel checks and node deletions occur within the same atomic database transaction. The introduction of RepairLinkNodes is a thoughtful addition to correct any database inconsistencies that may have resulted from the previous bug. The changes are well-structured, and the new tests are comprehensive. I have one suggestion to refactor a new function for improved readability.
We make sure that nodes previously suffering from this error will have a consistent db view when restarting their node.
Force-pushed from 270fc8d to d9fb909
Roasbeef left a comment
LGTM 🥨
Successfully created backport PR for v0.20.x-branch:
[v0.20.x-branch] Backport #10462: channeldb: fix race condition in link node pruning